feat: Phase 3a — merge metadata aggregation, message types, replaced_split_ids #6352

Open
g-talbot wants to merge 1 commit into gtt/merge-output-split-metadata from gtt/parquet-merge-pipeline

g-talbot (Contributor) commented Apr 28, 2026

Summary

This is the first PR of Phase 3 (pipeline integration). It builds on Phase 1 (merge engine, #6335) and Phase 2 (merge policy, #6351).

  • merge_parquet_split_metadata(): combines the metadata of the input splits with the physical metadata from MergeOutputFile to produce a complete ParquetSplitMetadata for the merged output. It validates that the invariant fields (kind, index_uid, partition_id, sort_fields, window) agree across inputs, unions metric_names and tags, and finalizes tag cardinality after the merge. Covered by 17 unit tests.
  • Message types: ParquetNewSplits, ParquetMergeTask, and ParquetMergeScratch for the merge actor chain (planner → scheduler → downloader → executor).
  • replaced_split_ids: added to ParquetSplitBatch and propagated through ParquetUploader, which previously hardcoded it to Vec::new(). This lets the merge executor specify which splits are replaced during an atomic publish-and-replace.
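A minimal Rust sketch of the aggregation step described above, under stated assumptions: SplitMeta, OutputFileMeta, and merge_split_metadata are hypothetical simplified stand-ins, not the PR's actual ParquetSplitMetadata, MergeOutputFile, or merge_parquet_split_metadata() definitions. The field layout is also assumed: tags as a map from tag key to a set of values, tag cardinality as the number of distinct values per key, and num_rows/size_bytes as the physical fields. The sketch shows the three steps in order: reject inputs whose invariant fields disagree, union the mergeable sets, then recompute derived cardinality and take physical stats from the output file.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified stand-in for ParquetSplitMetadata.
#[derive(Clone, Debug, PartialEq)]
pub struct SplitMeta {
    // Invariant fields: every input split must agree on these.
    pub kind: String,
    pub index_uid: String,
    pub partition_id: u64,
    pub sort_fields: String,
    pub window: (i64, i64),
    // Mergeable fields: unioned across all inputs.
    pub metric_names: BTreeSet<String>,
    pub tags: BTreeMap<String, BTreeSet<String>>,
    // Derived field (assumed meaning): distinct values per tag key.
    pub tag_cardinality: BTreeMap<String, usize>,
    // Physical fields: taken from the merged output file, not the inputs.
    pub num_rows: u64,
    pub size_bytes: u64,
}

/// Hypothetical stand-in for the physical metadata carried by MergeOutputFile.
pub struct OutputFileMeta {
    pub num_rows: u64,
    pub size_bytes: u64,
}

pub fn merge_split_metadata(
    inputs: &[SplitMeta],
    output: &OutputFileMeta,
) -> Result<SplitMeta, String> {
    let first = inputs.first().ok_or("no input splits")?;

    // Step 1: validate invariants before mutating anything.
    for split in &inputs[1..] {
        let same = split.kind == first.kind
            && split.index_uid == first.index_uid
            && split.partition_id == first.partition_id
            && split.sort_fields == first.sort_fields
            && split.window == first.window;
        if !same {
            return Err(format!(
                "invariant fields differ from the first input (index_uid {}, partition_id {})",
                split.index_uid, split.partition_id
            ));
        }
    }

    // Step 2: union metric names and per-key tag values.
    let mut merged = first.clone();
    for split in &inputs[1..] {
        merged.metric_names.extend(split.metric_names.iter().cloned());
        for (key, values) in &split.tags {
            merged.tags.entry(key.clone()).or_default().extend(values.iter().cloned());
        }
    }

    // Step 3: finalize cardinality from the unioned sets, then apply physical stats.
    merged.tag_cardinality = merged.tags.iter().map(|(k, v)| (k.clone(), v.len())).collect();
    merged.num_rows = output.num_rows;
    merged.size_bytes = output.size_bytes;
    Ok(merged)
}
```

Recomputing cardinality from the unioned sets, rather than summing per-input counts, avoids double-counting a tag value that appears in several input splits.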

Test plan

  • 17 unit tests for merge_parquet_split_metadata()
  • 4 existing ParquetUploader tests pass with new field
  • cargo clippy clean, cargo doc compiles, license headers OK

🤖 Generated with Claude Code

g-talbot changed the title from "feat: Phase 3a — merge metadata aggregation, message types, replaced_split_ids" to "feat: Phase 3 — Parquet merge pipeline integration (3a–3c)" (Apr 29, 2026)
g-talbot changed the title from "feat: Phase 3 — Parquet merge pipeline integration (3a–3c)" to "feat: Phase 3 — Parquet merge pipeline integration (3a–3e)" (Apr 29, 2026)
g-talbot force-pushed the gtt/parquet-merge-pipeline branch from 3227b37 to ceba410 (April 29, 2026 12:59)
g-talbot changed the title from "feat: Phase 3 — Parquet merge pipeline integration (3a–3e)" to "feat: Phase 3a–3c — merge metadata, planner, downloader, executor" (Apr 29, 2026)
g-talbot force-pushed the gtt/parquet-merge-pipeline branch from ceba410 to e96a920 (April 29, 2026 14:06)
g-talbot changed the title from "feat: Phase 3a–3c — merge metadata, planner, downloader, executor" to "feat: Phase 3a — merge metadata aggregation, message types, replaced_split_ids" (Apr 29, 2026)
g-talbot changed the base branch from main to gtt/parquet-merge-policy (April 29, 2026 14:14)
g-talbot force-pushed the gtt/parquet-merge-pipeline branch from e96a920 to 9926093 (April 29, 2026 15:29)
g-talbot changed the base branch from gtt/parquet-merge-policy to gtt/merge-output-split-metadata (April 29, 2026 15:29)
g-talbot force-pushed the gtt/parquet-merge-pipeline branch 3 times, most recently from acc5099 to 49176b0 (April 29, 2026 19:03)
g-talbot force-pushed the gtt/merge-output-split-metadata branch from fc6f90a to 720560d (April 29, 2026 20:53)
…it_ids (Phase 3a)

Phase 3 pipeline integration, first PR:

- merge_parquet_split_metadata(): aggregates input split metadata with
  MergeOutputFile physical metadata to produce complete ParquetSplitMetadata
  for merged output. Validates invariant fields, unions metric_names and tags,
  finalizes tag cardinality after merge. 17 tests.

- ParquetNewSplits, ParquetMergeTask, ParquetMergeScratch message types for
  the merge actor chain (planner → scheduler → downloader → executor).

- Add replaced_split_ids to ParquetSplitBatch and propagate it through
  ParquetUploader (was hardcoded Vec::new()). This enables the merge executor
  to specify which splits are being replaced.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
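A sketch of how replaced_split_ids might flow from a merge task to the publish step, assuming hypothetical simplified types: MergeTask, SplitBatch, and PublishRequest stand in for ParquetMergeTask, ParquetSplitBatch, and whatever the uploader forwards for publishing, and all of their fields are assumptions. The ingest path leaves the list empty, matching the previous hardcoded Vec::new() behavior, while the merge path lists the input splits so that one publish can add the merged split and retire its inputs.

```rust
/// Hypothetical stand-in for ParquetMergeTask: the inputs chosen by the planner.
pub struct MergeTask {
    pub merge_split_id: String,
    pub input_split_ids: Vec<String>,
}

/// Hypothetical stand-in for ParquetSplitBatch as handed to the uploader.
#[derive(Debug, Default)]
pub struct SplitBatch {
    pub new_split_ids: Vec<String>,
    // Splits this batch supersedes: empty for ingest, the merge inputs for merges.
    pub replaced_split_ids: Vec<String>,
}

/// Hypothetical publish request emitted by the uploader.
#[derive(Debug, PartialEq)]
pub struct PublishRequest {
    pub staged_split_ids: Vec<String>,
    pub replaced_split_ids: Vec<String>,
}

/// Ingest path: nothing is replaced (the old hardcoded behavior).
pub fn ingest_batch(split_id: &str) -> SplitBatch {
    SplitBatch {
        new_split_ids: vec![split_id.to_string()],
        replaced_split_ids: Vec::new(),
    }
}

/// Merge executor path: the merged split replaces every input split.
pub fn merge_batch(task: &MergeTask) -> SplitBatch {
    SplitBatch {
        new_split_ids: vec![task.merge_split_id.clone()],
        replaced_split_ids: task.input_split_ids.clone(),
    }
}

/// Uploader: forwards replaced_split_ids from the batch instead of hardcoding Vec::new().
pub fn to_publish_request(batch: SplitBatch) -> PublishRequest {
    PublishRequest {
        staged_split_ids: batch.new_split_ids,
        replaced_split_ids: batch.replaced_split_ids,
    }
}
```

Carrying the list on the batch, instead of having the uploader derive it, keeps the uploader agnostic about whether a batch came from ingest or from a merge.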
g-talbot force-pushed the gtt/parquet-merge-pipeline branch from 49176b0 to 17135dc (April 29, 2026 20:54)